One-Shot Talking Face Generation from Single-Speaker Audio-Visual Correlation Learning

Authors

Abstract

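Audio-driven one-shot talking face generation methods are usually trained on video resources of various persons. However, their created videos often suffer from unnatural mouth shapes and asynchronous lips because those methods struggle to learn a consistent speech style from different speakers. We observe that it would be much easier to learn a consistent speech style from a specific speaker, which leads to authentic mouth movements. Hence, we propose a novel one-shot talking face generation framework by exploring consistent correlations between audio and visual motions from a specific speaker and then transferring audio-driven motion fields to a reference image. Specifically, we develop an Audio-Visual Correlation Transformer (AVCT) that aims to infer talking motions, represented by keypoint-based dense motion fields, from an input audio. In particular, considering that audio may come from different identities in deployment, we incorporate phonemes to represent audio signals. In this manner, our AVCT can inherently generalize to audio spoken by other identities. Moreover, as face keypoints are used to represent speakers, AVCT is agnostic to the appearance of the training speaker, and thus allows us to manipulate face images of different identities readily. Considering that different face shapes lead to different motions, a motion field transfer module is exploited to reduce the audio-driven dense motion field gap between the training identity and the one-shot reference. Once we have obtained the dense motion field of the reference image, we employ an image renderer to generate its talking face videos from an audio clip. Thanks to our learned consistent speaking style, our method generates authentic mouth shapes and vivid movements. Extensive experiments demonstrate that our synthesized videos outperform the state of the art in terms of visual quality and lip-sync.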
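
The abstract speaks of keypoint-based dense motion fields but does not say how they are built. As a rough illustration only, the sketch below assumes one common construction from keypoint-driven animation models: each keypoint contributes its displacement to nearby pixels through a Gaussian weight, and a small constant background weight keeps distant pixels still. The function name dense_motion_field, the sigma and background parameters, and the toy keypoints are all hypothetical and are not taken from the paper.

```python
import math


def dense_motion_field(src_kps, drv_kps, height, width, sigma=1.0, background=1e-3):
    """Blend sparse keypoint displacements into a per-pixel (dx, dy) field.

    src_kps: keypoints (x, y) in the reference image.
    drv_kps: matching keypoints (x, y) for the driven frame
             (assumed here to be predicted from audio).
    Returns a height x width nested list of (dx, dy) offsets pointing
    from each driven-frame pixel back toward the reference image.
    """
    field = []
    for y in range(height):
        row = []
        for x in range(width):
            # Background term: zero motion with a small constant weight.
            weights = [background]
            shifts = [(0.0, 0.0)]
            for (sx, sy), (dx, dy) in zip(src_kps, drv_kps):
                dist2 = (x - dx) ** 2 + (y - dy) ** 2
                # Gaussian weight centred on the driven keypoint.
                weights.append(math.exp(-dist2 / (2.0 * sigma ** 2)))
                shifts.append((sx - dx, sy - dy))
            total = sum(weights)
            flow_x = sum(w * s[0] for w, s in zip(weights, shifts)) / total
            flow_y = sum(w * s[1] for w, s in zip(weights, shifts)) / total
            row.append((flow_x, flow_y))
        field.append(row)
    return field


# Toy example: one keypoint moves one pixel to the right on an 8x8 grid.
field = dense_motion_field([(2, 2)], [(3, 2)], height=8, width=8)
near = field[2][3]  # pixel at the driven keypoint: offset close to (-1, 0)
far = field[7][0]   # distant pixel: offset close to (0, 0)
```

In this toy setting, a renderer would sample the reference image at each pixel position plus its offset; the learned networks described in the abstract would replace both the hand-set keypoints and this fixed Gaussian blending.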

Similar resources

Audio-visual talking face detection

Talking face detection is important for videoconferencing. However, detecting the talking face is difficult because of the low resolution of the capturing devices, the informal style of communication, and background sounds. In this paper, we present a novel method for finding the talking face using a latent semantic indexing approach. We tested our method on a comprehensive set of home ...

Look Who's Talking: Speaker Detection using Video and Audio Correlation

The visual motion of the mouth and the corresponding audio data generated when a person speaks are highly correlated. This fact has been exploited for lip/speechreading and for improving speech recognition. We describe a method of automatically detecting a talking person (both spatially and temporally) using video and audio data from a single microphone. The audio-visual correlation is learned ...

An Audio-Visual Imposture Scenario by Talking Face Animation

With the emergence of PDAs, handheld PCs, and mobile telephones that use biometric recognition for user authentication, there is higher demand for automatic non-intrusive voice and face speaker verification systems. Such systems can be embedded in mobile devices to allow biometrically recognized users to sign and send data electronically, and to give their telephone conversation...

Exploiting Audio-visual Correlation in Coding of Talking Head Sequences

Ram R. Rao (Georgia Institute of Technology, Atlanta, GA 30332) and Tsuhan Chen (AT&T Bell Laboratories, Holmdel, NJ 07733). Abstract: In this paper, we present a novel means for predicting the shape of a person's mouth from the corresponding speech signal and explore applications of this prediction to video coding. One possible application...

One shot learning of simple visual concepts

People can learn visual concepts from just one example, but it remains a mystery how this is accomplished. Many authors have proposed that transferred knowledge from more familiar concepts is a route to one shot learning, but what is the form of this abstract knowledge? One hypothesis is that the sharing of parts is core to one shot learning, and we evaluate this idea in the domain of handwritt...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2022

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v36i3.20154